# Multi‑Agent Bandit Simulation

A lightweight yet extensible playground for **co‑operative stochastic multi‑armed bandits (MAB)**.

---

## Table of Contents

* [Overview](#overview)
* [Environment Settings](#environment-settings)
* [Project Layout](#project-layout)
* [Getting Started](#getting-started)
  * [Installation](#installation)
  * [Quick Run](#quick-run)
* [Experiment Suites](#experiment-suites)
  * [vary\_arm](#vary_arm)
  * [vary\_agent](#vary_agent)
  * [vary\_delta](#vary_delta)
* [License](#license)

---

## Overview

The repository focuses on **decentralised / federated bandit learning** where multiple agents pull arms and optionally share information.  We offer:

1. A **heterogeneous environment** (default in earlier commits) where each agent faces a *different* mean‑reward vector.
2. A **homogeneous environment** (new baseline) where every agent sees the *same* arm means.

Three experiment suites (`vary_arm`, `vary_agent`, `vary_delta`) each isolate a single factor, so you can answer questions like *“how does regret scale with the number of agents under heterogeneity?”* versus *“how does the reward gap Δ affect homogeneous learning?”*

Implemented algorithms (both settings):

| Acronym        | Paper title                                                                          | Reference         |
| -------------- | ------------------------------------------------------------------------------------ | ----------------- |
| **Dis-UCB**    | Distributed multi-armed bandits                                                      | Zhu et al., 2023  |
| **UCB-TCOM**   | Achieving near-optimal individual regret & low communications in multi-agent bandits | Wang et al., 2023 |
| **EpoInc-SE**  |                                                                                      | This work         |
| **Fed2-UCB**   | Federated multi-armed bandits                                                        | Shi et al., 2021  |
| **Gossip_UCB** | Federated bandit: A gossiping approach                                               | Zhu et al., 2021  |
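
Four of the five baselines carry "UCB" in their name. As a reference point only, here is a sketch of the classic single-agent UCB1 index (empirical mean plus √(2 ln t / n)). It is not the implementation of any algorithm in the table; the names `ucb1_indices` and `run_ucb1` and the Bernoulli rewards are illustrative assumptions.

```python
import math

import numpy as np


def ucb1_indices(sums: np.ndarray, counts: np.ndarray, t: int) -> np.ndarray:
    """UCB1 index of every arm at round t (1-based).

    Unpulled arms get +inf so every arm is tried once before the index is used.
    """
    indices = np.full(len(counts), np.inf)
    pulled = counts > 0
    means = sums[pulled] / counts[pulled]                 # empirical means
    bonus = np.sqrt(2.0 * math.log(t) / counts[pulled])   # exploration bonus
    indices[pulled] = means + bonus
    return indices


def run_ucb1(mu, T, seed=0):
    """Hypothetical single-agent loop with Bernoulli arms; returns pull counts."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    sums = np.zeros(K)
    counts = np.zeros(K, dtype=int)
    for t in range(1, T + 1):
        arm = int(np.argmax(ucb1_indices(sums, counts, t)))
        reward = float(rng.random() < mu[arm])            # Bernoulli(mu[arm]) sample
        sums[arm] += reward
        counts[arm] += 1
    return counts


print(run_ucb1([0.9, 0.5, 0.1], T=2000))
```

The sketch leaves communication out entirely; it only illustrates the per-agent index.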

---

## Environment Settings

| Setting                     | Arm matrix shape | Mean‑reward interpretation                               |
| --------------------------- | ---------------- | -------------------------------------------------------- |
| **Heterogeneous** (default) | `(K, N)`         | Arm *i* has a different mean for each of the *N* agents. |
| **Homogeneous** (baseline)  | `(K, 1)`         | All agents share the same *K*‑dimensional mean vector.   |

Both settings share identical APIs; to switch, run the corresponding script (the homogeneous scripts live under `homogeneous/`, while the `vary_*` suites use the heterogeneous `(K, N)` setting).
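
A minimal sketch of the two settings, assuming Bernoulli rewards and means drawn uniformly from [0, 1] (neither is confirmed by the scripts). `make_means` and `pull` are hypothetical helpers, not functions from this repository. The only difference between the settings is whether agent *j* reads column *j* or the single shared column.

```python
import numpy as np


def make_means(K, N, heterogeneous=True, seed=0):
    """Hypothetical helper: draw an arm-mean matrix.

    Heterogeneous: shape (K, N), one column of means per agent.
    Homogeneous:   shape (K, 1), a single column shared by all agents.
    """
    rng = np.random.default_rng(seed)
    cols = N if heterogeneous else 1
    return rng.uniform(0.0, 1.0, size=(K, cols))


def pull(means, arms, rng):
    """Sample one Bernoulli reward per agent (assumed reward model).

    arms[j] is the arm chosen by agent j. With a (K, 1) matrix every agent
    reads column 0, i.e. the shared mean vector.
    """
    N = len(arms)
    cols = np.arange(N) if means.shape[1] > 1 else np.zeros(N, dtype=int)
    mu = means[np.asarray(arms), cols]        # mean of each agent's chosen arm
    return (rng.random(N) < mu).astype(float)


rng = np.random.default_rng(1)
print(pull(make_means(8, 5), [0, 1, 2, 3, 4], rng))                       # hetero
print(pull(make_means(8, 5, heterogeneous=False), [0, 1, 2, 3, 4], rng))  # homo
```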

---

## Project Layout

```
.
├── vary_arm/                 # Varies the number of arms; agents fixed. Arm shape: (K, N)
│   ├── des/
│   │   ├── des_mu8.py
│   │   ├── …
│   │   └── des_mu16.py
│   ├── ducb/
│   │   ├── ducb_mu8.py
│   │   ├── …
│   │   └── ducb_mu16.py
│   ├── feducb/
│   │   ├── feducb_mu8.py
│   │   ├── …
│   │   └── feducb_mu16.py
│   └── tomf/
│       ├── tomf_mu8.py
│       ├── …
│       └── tomf_mu16.py
├── vary_agent/               # Varies the number of agents; arms fixed. Arm shape: (K, N)
│   ├── des/
│   │   ├── des_agent_11.py
│   │   ├── des_agent_14.py
│   │   ├── des_agent_17.py
│   │   └── des_agent_20.py
│   ├── ducb/
│   │   ├── ducb_agent_11.py
│   │   ├── ducb_agent_14.py
│   │   ├── ducb_agent_17.py
│   │   └── ducb_agent_20.py
│   ├── feducb/
│   │   ├── feducb_agent_11.py
│   │   ├── feducb_agent_14.py
│   │   ├── feducb_agent_17.py
│   │   └── feducb_agent_20.py
│   ├── gossip/
│   │   ├── gossip_agent_11.py
│   │   ├── gossip_agent_14.py
│   │   ├── gossip_agent_17.py
│   │   └── gossip_agent_20.py
│   └── tomf/
│       ├── tomf_agent_11.py
│       ├── tomf_agent_14.py
│       ├── tomf_agent_17.py
│       └── tomf_agent_20.py
├── vary_delta/               # Varies the reward gap Δ; agents and arms fixed. Arm shape: (K, N)
│   ├── des/
│   │   ├── des_delta1.py
│   │   ├── des_delta2.py
│   │   ├── des_delta3.py
│   │   └── des_delta4.py
│   ├── ducb/
│   │   ├── ducb_delta10.py
│   │   ├── ducb_delta12.py
│   │   ├── ducb_delta20.py
│   │   ├── ducb_delta22.py
│   │   ├── ducb_delta30.py
│   │   ├── ducb_delta32.py
│   │   ├── ducb_delta40.py
│   │   └── ducb_delta42.py
│   ├── feducb/
│   │   ├── feducb_delta1.py
│   │   ├── feducb_delta2.py
│   │   ├── feducb_delta3.py
│   │   └── feducb_delta4.py
│   ├── gossip/
│   │   ├── gossip_delta1.py
│   │   ├── gossip_delta2.py
│   │   ├── gossip_delta3.py
│   │   └── gossip_delta4.py
│   └── tomf/
│       ├── tomf_delta1.py
│       ├── tomf_delta2.py
│       ├── tomf_delta3.py
│       └── tomf_delta4.py
├── homogeneous/              # Agents and arms fixed. Arm shape: (K, 1)
│   ├── des.py
│   ├── ducb.py
│   ├── feducb.py
│   ├── tomf.py
│   └── gossip.py
└── plot/                     # Plotting utilities
    ├── group_reg.py          # group regret ↔ time
    ├── ind_reg.py            # individual regret ↔ time
    ├── comm_bits.py          # communication bits ↔ time
    └── utils.py              # shared helpers
```
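
The `plot/` scripts report group and individual regret. Their internals are not documented here, so the following is only a sketch of one common definition: each agent's pseudo-regret is measured against its own best arm mean, and group regret is the sum over agents. `regret_curves` is a hypothetical helper and the `(T, N)` action log is an assumed input format.

```python
import numpy as np


def regret_curves(means, actions):
    """Cumulative pseudo-regret from a log of chosen arms (hypothetical helper).

    means:   (K, N) heterogeneous or (K, 1) homogeneous arm-mean matrix.
    actions: (T, N) ints, actions[t, j] = arm pulled by agent j at round t.
    Returns (individual, group):
      individual: (T, N) cumulative regret of each agent,
      group:      (T,)   sum of the individual curves over agents.
    """
    means = np.asarray(means, dtype=float)
    actions = np.asarray(actions, dtype=int)
    T, N = actions.shape
    if means.shape[1] == 1:
        means = np.repeat(means, N, axis=1)   # homogeneous: same column for everyone
    best = means.max(axis=0)                  # (N,) best mean seen by each agent
    chosen = means[actions, np.arange(N)]     # (T, N) mean of each pulled arm
    individual = np.cumsum(best - chosen, axis=0)
    return individual, individual.sum(axis=1)


ind, grp = regret_curves([[0.9, 0.2], [0.5, 0.8]], [[0, 1], [1, 0]])
print(ind)
print(grp)
```

For the homogeneous `(K, 1)` matrix the single column is repeated for all agents, so both settings go through the same code path.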

All scripts hard-code their own horizon of T = 1,000,000 rounds, so **no CLI parameters** are needed.

---

## Getting Started

### Installation

```bash
$ conda create -n des python=3.12
$ conda activate des
$ pip install numpy matplotlib tqdm
```

### Quick Run

```bash
# ----- ARMS sweep (heterogeneous) -----
$ python vary_arm/des/des_mu8.py                # 8 arms, hetero

# ----- AGENTS sweep -----
$ python vary_agent/gossip/gossip_agent_11.py   # 11 agents, hetero

# ----- Δ sweep -----
$ python vary_delta/feducb/feducb_delta2.py     # Δ = 10⁻², hetero
```

All runs stream progress to stdout and write logs under `results/<suite>/<script-name>/` (CSV: regret, pulls, communications).
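
Because the scripts take no CLI parameters, a whole suite can be swept by launching each script in turn. The helper below is a hypothetical convenience wrapper, not part of the repository; it assumes the `<suite>/<algorithm>/<script>.py` structure from the Project Layout section and uses only `pathlib` and `subprocess` from the standard library.

```python
import subprocess
import sys
from pathlib import Path


def run_suite(suite_dir, dry_run=False):
    """Run every algorithm script below `suite_dir`, one after another.

    Assumes the <suite>/<algorithm>/<script>.py layout; scripts directly in
    `suite_dir` are ignored. Each script is launched as `python <script>`
    with the current interpreter. Returns the scripts in run order.
    """
    scripts = sorted(Path(suite_dir).glob("*/*.py"))
    for script in scripts:
        print(f"==> {script}")
        if not dry_run:
            # check=True stops the sweep at the first script that fails
            subprocess.run([sys.executable, str(script)], check=True)
    return scripts
```

From the repository root, `run_suite("vary_delta", dry_run=True)` only lists what would run; dropping `dry_run` executes the scripts sequentially, which may take a while at T = 1,000,000.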

---

## Experiment Suites

### `vary_arm`

* **Fixed:** agents = 5, Δ = 10⁻¹ (unless `delta` suffix present).
* **Variable:** arms ∈ {8, 9, 10, 12, 15}.
* **Environments:** hetero & homo variants available.

### `vary_agent`

* **Fixed:** arms = 8, Δ = 10⁻¹.
* **Variable:** agents ∈ {5, 8, 10}.
* **Environments:** hetero & homo.

### `vary_delta`

* **Fixed:** agents = 5, arms = 8.
* **Variable:** reward‑gap exponent `d` ∈ {1, 2, …} ⇒ Δ = 10⁻ᵈ.
* **Environments:** hetero & homo.
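
The suites do not spell out how a gap Δ = 10⁻ᵈ is turned into arm means. One simple construction is sketched below, assuming a single best arm per agent with mean 0.9 and every other arm exactly Δ below it; `gap_means` and the value 0.9 are illustrative assumptions, not taken from the scripts.

```python
import numpy as np


def gap_means(K, N, d, best=0.9, heterogeneous=True, seed=0):
    """Hypothetical (K, N) or (K, 1) mean matrix with reward gap 10**-d.

    In every column one arm (picked at random per agent when heterogeneous)
    gets mean `best`; every other arm gets `best - 10**-d`.
    """
    delta = 10.0 ** (-d)
    rng = np.random.default_rng(seed)
    cols = N if heterogeneous else 1
    means = np.full((K, cols), best - delta)
    best_arm = rng.integers(0, K, size=cols)      # index of the best arm per column
    means[best_arm, np.arange(cols)] = best
    return means


print(gap_means(K=8, N=5, d=2))
```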

---

## License

Released under the **MIT License** — see [LICENSE](LICENSE).

